Extracting Syntax Statistics from Large Corpora of Written English

Author

  • Douglas L. T. Rohde
Abstract

The field of linguistics has seen a growing interest in the statistics of everyday language. In studying how we acquire language and why some of its aspects are more difficult for us than others, it is critical to understand the linguistic environment to which we are exposed. However, gathering statistics over syntactic structures, even with a syntactically tagged corpus, can be difficult and time-consuming. This report describes a partially automated method that alleviates many of the problems associated with gathering syntax statistics from parsed corpora. The method is then used to analyze a variety of structures in the Wall Street Journal and Brown corpora of written English. These structures include verb phrases, relative clauses, sentential noun phrases, prepositional phrases, and coordinate and subordinate clauses.

Similar resources

Lexical Bundles in English Abstracts of Research Articles Written by Iranian Scholars: Examples from Humanities

This paper investigates a special type of recurrent expressions, lexical bundles, defined as a sequence of three or more words that co-occur frequently in a particular register (Biber et al., 1999). Considering the importance of this group of multi-word sequences in academic prose, this study explores the forms and syntactic structures of three- and four-word bundles in English abstracts writte...

Move Structures in “Statement-of-the-Problem” Sections of M.A. Theses: The Case of Native and Nonnative Speakers of English

Understanding how to structure the “Statement-of-the-Problem” (SP) section of a thesis is necessary for EFL students to develop a logical argumentation for a problem statement. This study intended to compare Move structures of SP sections of theses written by native speakers of Persian (NSPs) and English (NSEs). To this end, 100 SP sections (50 SP sections written by NSE...

Metadiscourse Elements in English Research Articles Written by Native English and Non-native Iranian Writers in Applied Linguistics and Civil Engineering

This study investigated metadiscourse and its subcategories in English research articles (RAs) written by nonnative (Iranian) and native English writers from the two disciplines of applied linguistics and civil engineering. The study aimed at seeing whether language and discipline influenced the frequency of occurrence of metadiscourse elements in research articles. To this end, a sample of 120...

Mining Parenthetical Translations for Polish-English Lexica

Documents written in languages other than English sometimes include parenthetical English translations, usually for technical and scientific terminology. Techniques had been developed for extracting such translations (as well as transliterations) from large Chinese text corpora. This paper presents methods for mining parenthetical translation in Polish texts. The main difference between translati...

A Comparative Analysis of Epistemic and Root Modality in Two selected English Books in the Field of Applied Linguistics Written by English Native and Iranian Non-native Writers

Academic discourse has always been the focus of many linguists, especially those who have been involved with English for Academic Purposes (EAP) and discourse analysis. Persuasion, as part of rhetorical structure of academic writing, is partly achieved by employing modality markers. Adopting a descriptive design, the present study was carried out to compare the use of modality markers in terms...

Publication date: 2000